Glossary

complex systems: systems with a large number of mutually interacting parts, often open 
to their environment, which self-organize their internal structure and their dynamics with 
novel and sometimes surprising macroscopic "emergent" properties. 

criticality (in physics): a state in which spontaneous fluctuations of the order parameter
occur at all scales, leading to diverging correlation length and susceptibility of the system
to external influences.

power law distribution: a specific family of statistical distributions appearing as a straight
line in a log-log plot; it does not possess characteristic scales and exhibits the property of
scale invariance.



self-organized criticality: when the system dynamics is attracted spontaneously, without 
any obvious need for parameter tuning, to a critical state with infinite correlation length 
and power law statistics. 

stretched-exponential distribution: a specific family of sub-exponential distributions
interpolating smoothly between the exponential distribution and the power law family.

1 Definition of the subject and its importance 

This Core article for the Encyclopedia of Complexity and System Science (Springer Science)
briefly reviews the concepts underlying complex systems and probability distributions. The
latter are often taken as the first quantitative characteristics of complex systems, allowing
one to detect the possible occurrence of regularities, providing a step toward defining a
classification of the different levels of organization (the "universality classes"). A rapid survey
covers the Gaussian law, the power law and the stretched exponential distributions. The
fascination for power laws is then explained, starting from the statistical physics approach
to critical phenomena, out-of-equilibrium phase transitions, self-organized criticality, and
ending with a large but not exhaustive list of mechanisms leading to power law distributions.
A check-list for testing and qualifying a power law distribution from your data is
described in 7 steps. This essay enlarges the description of distributions by proposing that
"kings", i.e., events even beyond the extrapolation of the power law tail, may reveal
information that is complementary and perhaps sometimes even more important than
the power law distribution. We conclude with a list of future directions.

2 Introduction 

2.1 Complex systems 

The study of out-of-equilibrium dynamics (e.g. dynamical phase transitions) and of hetero- 
geneous systems (e.g. spin-glasses) has progressively made popular in physics the concept 
of complex systems and the importance of systemic approaches: systems with a large 
number of mutually interacting parts, often open to their environment, self-organize their 
internal structure and their dynamics with novel and sometimes surprising macroscopic
("emergent") properties. The complex system approach, which involves "seeing"
interconnections and relationships, i.e., the whole picture as well as the component parts, is
nowadays pervasive in modern control of engineering devices and business management. It
also plays an increasing role in most of the scientific disciplines, including biology
(biological networks, ecology, evolution, origin of life, immunology, neurobiology, molecular
biology, etc.), geology (plate-tectonics, earthquakes and volcanoes, erosion and landscapes,
climate and weather, environment, etc.), economics and social sciences (including cogni- 
tion, distributed learning, interacting agents, etc.). There is a growing recognition that 
progress in most of these disciplines, in many of the pressing issues for our future welfare as 
well as for the management of our everyday life, will need such a systemic complex system 
and multidisciplinary approach.

A central property of a complex system is the possible occurrence of coherent large-scale
collective behaviors with a very rich structure, resulting from the repeated non-linear
interactions among its constituents: the whole turns out to be much more than the sum 
of its parts. Most complex systems around us exhibit rare and sudden transitions, that 
occur over time intervals that are short compared to the characteristic time scales of their 
posterior evolution. Such extreme events express more than anything else the underly- 
ing "forces" usually hidden by almost perfect balance and thus provide the potential for 
a better scientific understanding of complex systems. These crises have fundamental so- 
cietal impacts and range from large natural catastrophes such as earthquakes, volcanic 
eruptions, hurricanes and tornadoes, landslides, avalanches, lightning strikes, catastrophic 
events of environmental degradation, to the failure of engineering structures, crashes in 
the stock market, social unrest leading to large-scale strikes and upheaval, economic draw- 
downs on national and global scales, regional power blackouts, traffic gridlock, diseases and 
epidemics, etc. 

Given the complex dynamics of these systems, a first standard attempt to quantify and 
classify the characteristics and the possible different regimes consists in 

1. identifying discrete events, 

2. measuring their sizes, 

3. constructing their probability distribution. 

The interest in probability distributions in complex systems has the following roots. 

• They offer a natural metric of the relative rate of occurrence of small versus large 
events, and thus of the associated risks. 

• As such, they constitute essential components of risk assessment and prerequisites of 
risk management. 

• Their mathematical form can provide constraints and guidelines to identify the un- 
derlying mechanisms at their origin and thus at the origin of the behavior of the 
complex system under study. 

• This improved understanding may lead to better forecasting skills, and even to the
option (or illusion (?)) of (a certain degree of) control [HIS].

2.2 Probability distributions 

Let us first fix some notation and vocabulary. Consider a process $X$ whose outcome is
a real number. The probability density function $P(x)$ of $X$ (pdf, also called probability
distribution) is such that the probability that $X$ is found in a small interval $\Delta x$ around $x$
is $P(x)\Delta x$. The probability that $X$ is between $a$ and $b$ is therefore given by the integral of
$P(x)$ between $a$ and $b$:

$$P(a < X < b) = \int_a^b P(x)\,dx . \qquad (1)$$

The pdf $P(x)$ depends on the units used to quantify the variable $x$ and has the dimension
of the inverse of $x$, such that $P(x)\Delta x$, being a probability, i.e., a number between 0 and 1,
is dimensionless. In a change of variable, say $x \to y = f(x)$, the probability is invariant.
Thus, the invariant quantity is the probability $P(x)\Delta x$ and not the pdf $P(x)$. We thus
have

$$P(x)\,\Delta x = P(y)\,\Delta y , \qquad (2)$$

leading to $P(y) = P(x)\,|df/dx|^{-1}$, taking the limit of infinitesimal intervals. By definition,
$P(x) \geq 0$. It is normalized, $\int_{x_{\min}}^{x_{\max}} P(x)\,dx = 1$, where $x_{\min}$ and $x_{\max}$ (often $\pm\infty$) are the
smallest and largest possible values for $x$, respectively.
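The change-of-variable rule $P(y) = P(x)\,|df/dx|^{-1}$ can be checked numerically. The sketch below is only an illustration under assumptions that are not in the text: $X$ is taken to be exponentially distributed with unit rate and the map is $y = f(x) = x^2$ on $x \geq 0$. All function names are hypothetical. It compares the probability of an interval in $x$ with the probability of its image in $y$, which should coincide because the probability, not the pdf, is the invariant quantity.

```python
import math

def pdf_x(x):
    """Assumed pdf of X: exponential with unit rate, P(x) = exp(-x) for x >= 0."""
    return math.exp(-x) if x >= 0 else 0.0

def f(x):
    """Assumed monotonic change of variable y = f(x) = x**2 on x >= 0."""
    return x * x

def dfdx(x):
    """Derivative of f, needed for the Jacobian factor |df/dx|."""
    return 2.0 * x

def pdf_y(y):
    """pdf of Y = f(X) obtained from P(y) = P(x) |df/dx|^(-1) at x = sqrt(y)."""
    x = math.sqrt(y)
    return pdf_x(x) / abs(dfdx(x))

def integrate(g, a, b, n=20000):
    """Midpoint-rule approximation of the integral of g between a and b."""
    h = (b - a) / n
    return h * sum(g(a + (i + 0.5) * h) for i in range(n))

a, b = 0.5, 2.0
prob_x = integrate(pdf_x, a, b)        # probability that X lies in [a, b]
prob_y = integrate(pdf_y, f(a), f(b))  # probability that Y lies in [f(a), f(b)]
print(prob_x, prob_y)
```

The two printed numbers agree to within the accuracy of the numerical integration, even though `pdf_x` and `pdf_y` take different values and carry different units.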

The empirical estimation of the pdf P{x) is usually plotted with the horizontal axis 
scaled as a graded series for the measure under consideration (the magnitude of the earth- 
quakes, etc.) and the vertical axis scaled for the number of outcomes or measures in each 
interval of horizontal value (the earthquakes of magnitude between 1 and 2, between 2 and 
3, etc.). This implies a "binning" into small intervals. If the data is sparse, the number of 
events in each bin becomes small and can fluctuate, leading to a poor representation of the 
data. In this case, it is useful to construct the cumulative distribution $\mathcal{P}_<(x)$ defined by

$$\mathcal{P}_<(x) = \int_{x_{\min}}^{x} P(y)\,dy , \qquad (3)$$

which is much less sensitive to fluctuations. $\mathcal{P}_<(x)$ gives the fraction of events with values
less than or equal to $x$. $\mathcal{P}_<(x)$ increases monotonically with $x$ from 0 to 1. Similarly,
we can define the so-called complementary cumulative (or survivor) distribution
$\mathcal{P}_>(x) = 1 - \mathcal{P}_<(x)$.



For random variables which take only discrete values $x_1, x_2, \ldots, x_n$, the pdf is made
of a discrete sum of Dirac functions $(1/n)[\delta(x - x_1) + \delta(x - x_2) + \ldots + \delta(x - x_n)]$. The
corresponding cumulative distribution function (cdf) $\mathcal{P}_<(x)$ is a staircase. There are also
more complex distributions made of a continuous cdf but which are singular with respect
to the Lebesgue measure $dx$. An example is the Cantor distribution constructed from the
Cantor set (see for instance Chapter 5 in p]). Such a singular cdf is continuous but has a
derivative which is zero almost everywhere: the pdf does not exist (see e.g. [1]).
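For a finite data set, the cumulative and survivor distributions described above are staircases that can be built without any binning. The following is a minimal sketch; the function name and the sample values are hypothetical and serve only to show the construction.

```python
def empirical_cdf_and_survival(data):
    """Return (x, cdf, survival) at each distinct value of the sample.

    cdf is the fraction of observations <= x, and survival = 1 - cdf
    is the fraction of observations strictly larger than x.
    """
    n = len(data)
    xs = sorted(data)
    steps = []
    for i, x in enumerate(xs):
        # Record a step only at the last copy of a repeated value.
        is_last_copy = (i + 1 == n) or (xs[i + 1] != x)
        if is_last_copy:
            cdf = (i + 1) / n
            steps.append((x, cdf, 1.0 - cdf))
    return steps

sample = [3.0, 1.0, 2.0, 2.0, 5.0]  # hypothetical event sizes
for x, cdf, survival in empirical_cdf_and_survival(sample):
    print(x, cdf, survival)
```

Each observation contributes a jump of height 1/n, so no choice of bin width is involved; this is why the cumulative representation is less sensitive to fluctuations when data are sparse.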

2.3 Brief survey of probability distributions 

Statistical physics is rich with probability distributions. The most famous is the Boltzmann 
distribution, which describes the probability that the configuration of the system in thermal 
equilibrium has a given energy. Its extension to out-of-equilibrium systems is the subject of 
intense scrutiny [5]; see also Chapter 7 of [3] and references therein. Special cases include
the Maxwell-Boltzmann distribution, the Bose-Einstein distribution and the Fermi-Dirac 
distribution. 

In the quest to characterize complex systems, two distributions have played a leading 
role: the normal (or Gaussian) distribution and the power law distribution. The Gaussian 
distribution is the paradigm of the "mild" family of distributions. In contrast, the power 
law distribution is the representative of the "wild" family. The contrast between "mild" 
and "wild" is illustrated by the following questions. 

• What is the probability that someone has twice your height? Essentially zero! The
height, weight and many other variables are distributed with "mild" pdfs with a
well-defined typical value and relatively small variations around it. The Gaussian law is
the archetype of "mild" distributions.

• What is the probability that someone has twice your wealth? The answer of course
depends somewhat on your wealth but in general, there is a non-vanishing fraction
of the population twice, ten times or even one hundred times as wealthy as you are.
This was noticed at the end of the nineteenth century by Pareto, after whom the Pareto
law has been named, which describes the power law distribution of wealth [SI El, a
typical example of "wild" distributions.
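The contrast between the two questions can be made quantitative with a rough sketch. The numbers used here are illustrative assumptions, not values from the text: heights are taken as Gaussian with mean 170 cm and standard deviation 10 cm, and the wealth tail is taken as a pure power law with exponent 1.5. Function names are hypothetical.

```python
import math

def gaussian_survival(x, mean, sigma):
    """P(X > x) for a Gaussian variable with the given mean and standard deviation."""
    return 0.5 * math.erfc((x - mean) / (sigma * math.sqrt(2.0)))

def pareto_tail_ratio(factor, mu):
    """P(W > factor * w) / P(W > w) for a pure power law tail with exponent mu."""
    return factor ** (-mu)

# "Mild" case: probability of a height twice the assumed typical value.
p_twice_height = gaussian_survival(2 * 170.0, mean=170.0, sigma=10.0)
print("twice the typical height:", p_twice_height)

# "Wild" case: among people at least as wealthy as you, the fraction that is
# at least 2, 10 or 100 times wealthier (assumed tail exponent mu = 1.5).
for factor in (2, 10, 100):
    print(factor, "times wealthier:", pareto_tail_ratio(factor, mu=1.5))
```

Under these assumptions the Gaussian probability is smaller than any practically observable frequency, while the power law fractions remain of order one in a thousand even for a factor of one hundred.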

2.3.1 The normal (or Gaussian) distribution 

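The expression of the Gaussian probability density function of a random variable $x$ with
mean $x_0$ and standard deviation $\sigma$ reads

$$P(x) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left(-\frac{(x - x_0)^2}{2\sigma^2}\right) , \quad \text{defined for } -\infty < x < +\infty . \qquad (4)$$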



The importance of the normal distribution as a model of quantitative phenomena in 
the natural and social sciences can be in large part attributed to the central limit theorem. 
Many measurements of physical as well as social phenomena can be well approximated by 
the normal distribution. While the mechanisms underlying these phenomena are often un- 
known, the use of the normal model can be theoretically justified by assuming that many 
small, independent effects are additively contributing to each observation. The Gaussian 
distribution is also justified as the most parsimonious choice in the absence of information
other than just the mean and the variance: it maximizes the information entropy among 
all distributions with known mean and variance. As a result of the central limit theorem, 
the normal distribution is the most widely used family of distributions in statistics and 
many statistical tests are based on the assumption of asymptotic normality of the data. In 
probability theory, the standard Gaussian distribution arises as the limiting distribution of 
a large class of distributions of random variables (with suitable centering and normaliza- 
tion) characterized by a finite variance, which is nothing but the statement of the central 
limit theorem (see e.g. Chapter 2 in [5]). 

At the beginning of the twenty-first century, when power laws are often taken as the 
hallmark of complexity, it is interesting to reflect on the fact that the previous giants of 
science in the eighteenth and nineteenth centuries (Halley, Laplace, Quetelet, Maxwell and so
on) considered that the Gaussian distribution expressed a kind of universal law of nature 
and of society. In particular, the Belgian astronomer Adolphe Quetelet was instrumental in 
popularizing the statistical regularities discovered by Laplace in the frame of the Gaussian 
distribution, which influenced the likes of John Herschel and John Stuart Mill and led 
Comte to define the concept of "social physics." 

2.3.2 The power law distribution 

A probability distribution function $P(x)$ exhibiting a power law tail is such that

$$P(x) \propto \frac{C_\mu}{x^{1+\mu}} , \quad \text{for } x \text{ large} , \qquad (5)$$

possibly up to some large limiting cut-off. The exponent $\mu$ (also referred to as the "index")
characterizes the nature of the tail: for $\mu < 2$, one speaks of a "heavy tail" for which the
variance is theoretically not defined. The scale factor $C_\mu$ plays a role analogous for power
laws to the role of the variance for Gaussian distributions (see e.g. Chapter 4 in [3]). In
particular, it enjoys the additivity property: the scale factor of the distribution of the sum
of several independent random variables, each with a distribution exhibiting a power law
tail with the same exponent $\mu$, is equal to the sum of the scale factors characterizing each
distribution of each random variable in the sum.
A more general form is

$$P(x) \propto \frac{L(x)}{x^{1+\mu}} , \quad \text{for } x \text{ large} , \qquad (6)$$

where $L(x)$ is a slowly varying function defined by $\lim_{x \to \infty} L(tx)/L(x) = 1$ for any finite
$t$ (typically, $L(x)$ is a logarithm $\ln(x)$ or a power of a logarithm such as $(\ln(x))^n$ with $n$
finite). In mathematical language, a function such as (6) is said to be "regularly varying."
This more general form means that the power law regime is only an asymptotic statement
holding as a better and better approximation as one considers larger and larger $x$ values.

Power laws obey the symmetry of scale invariance, that is, they satisfy the following
defining property that, for an arbitrary real number $\lambda$, there exists a real number $\gamma$ such
that

$$P(x) = \gamma P(\lambda x) , \quad \forall x . \qquad (7)$$

Obviously, $\gamma = \lambda^{1+\mu}$. The relation (7) means that the ratio of the probabilities of occurrence
of two sizes $x_1$ and $x_2$ depends only on their ratio $x_1/x_2$ and not on their absolute values.
For instance, according to the Zipf law ($\mu = 1$) for the distribution of city sizes, the ratio of
the number of cities with more than 1 million inhabitants to those with more than 100,000
persons is the same as the ratio of the number of cities with more than 100,000 inhabitants
to those with more than 10,000 persons, both ratios being equal to 1/10. The symmetry of
scale invariance (7) extends to the space of functions the concept of scale invariance which
characterizes fractal geometric objects.
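The scale invariance property and the city-size example can be verified directly. The sketch below assumes a pure power law pdf proportional to $x^{-(1+\mu)}$ and the corresponding survival function proportional to $x^{-\mu}$; function names and the chosen test values of $x$ are hypothetical.

```python
def power_law_pdf(x, mu, c_mu=1.0):
    """Pure power law density c_mu / x**(1 + mu), up to normalization."""
    return c_mu / x ** (1.0 + mu)

def power_law_survival(x, mu, x_min=1.0):
    """Fraction of events larger than x for a pure power law above x_min."""
    return (x / x_min) ** (-mu)

mu = 1.0              # Zipf law
lam = 10.0            # arbitrary rescaling factor
gamma = lam ** (1.0 + mu)

# Scale invariance: P(x) equals gamma * P(lam * x) whatever the value of x.
for x in (1.0, 37.0, 5.0e4):
    print(x, power_law_pdf(x, mu), gamma * power_law_pdf(lam * x, mu))

# City sizes: the ratio of counts depends only on the ratio of thresholds.
ratio_large = power_law_survival(1.0e6, mu) / power_law_survival(1.0e5, mu)
ratio_small = power_law_survival(1.0e5, mu) / power_law_survival(1.0e4, mu)
print(ratio_large, ratio_small)
```

Both ratios come out equal to 1/10 up to rounding, as stated for the Zipf law; replacing the power law by an exponential would make the ratios depend on the absolute thresholds.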

It should be stressed that, when they exhibit a power law-like shape, most empirical 
distributions do so only over a finite range of event sizes, either bounded between a lower 
and an upper cut-off [HI El [IHl E], or above a lower threshold, i.e., only in the tail of the 
observed distribution [121 [131 [T31 [15] . Power law distributions and more generally regularly 
varying distributions remain robust functional forms under a large number of operations, 
such as linear combinations, products, minima, maxima, order statistics, powers, which 
may also explain their ubiquity and attractiveness. Jessen and Mikosch [16] give the
conditions under which transformations of power law distributions are also regularly varying,
possibly with a different exponent (see also section 4.4 for a heuristic presentation of
similar results).

2.3.3 The stretched exponential distribution

The so-called stretched exponential (SE) distributions have been found to be a versatile
intermediate distribution interpolating between "thin tail" (Gaussian, exponential, ...) and
very "fat tail" distributions. In particular, Laherrère and Sornette [TTj have found that
several examples of fat-tailed distributions in the natural and social sciences, often
considered to be good examples of power laws, could be represented as well as or even better
sometimes by an SE distribution. Malevergne et al. [18] present systematic statistical tests
comparing the SE family with the power law distribution in the context of financial
return distributions. The SE family is defined by the following expression for the distribution
of values above a lower threshold $u$, written here in cumulative form (the survival
distribution, also called the complementary cumulative distribution function, is the
exponential term alone):

$$\mathcal{P}_{>u}(x) = 1 - \exp\left[-\left(\frac{x}{d}\right)^c + \left(\frac{u}{d}\right)^c\right] , \quad \text{for } x \geq u . \qquad (8)$$

The constant $u$ is a lower threshold that can be changed to emphasize more the tail of
the distribution as $u$ is increased. The structural exponent $c$ controls the "thin" versus
"heavy" nature of the tail.

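1. For $c = 2$, the SE distribution (8) has the same asymptotic tail as the Gaussian
distribution.

2. For $c = 1$, expression (8) recovers the pure exponential distribution.

3. For $c < 1$, the tail of $\mathcal{P}_{>u}(x)$ is fatter than an exponential, and corresponds to the
regime of sub-exponentials (see Chapter 6 in [3]).

4. For $c \to 0$ with

$$c \cdot \left(\frac{u}{d}\right)^c \to \mu , \qquad (9)$$

the SE distribution converges to the Pareto distribution with tail exponent $\mu$.

Indeed, we can write

$$\frac{c}{d^c}\, x^{c-1} \exp\left(-\frac{x^c - u^c}{d^c}\right) = c \left(\frac{u}{d}\right)^c \frac{x^{c-1}}{u^c} \exp\left[-\left(\frac{u}{d}\right)^c \left(\left(\frac{x}{u}\right)^c - 1\right)\right]$$
$$\simeq \mu \cdot x^{-1} \exp\left[-c \left(\frac{u}{d}\right)^c \ln\frac{x}{u}\right] , \quad \text{as } c \to 0$$
$$\simeq \mu \cdot x^{-1} \exp\left[-\mu \ln\frac{x}{u}\right] = \mu\, \frac{u^\mu}{x^{\mu+1}} , \qquad (10)$$

which is the pdf of the Pareto power law model with tail index $\mu$. This implies that, as
$c \to 0$, the characteristic scale $d$ of the SE model must also go to zero with $c$ to ensure its
convergence towards the Pareto distribution.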

This shows that the Pareto model can be approximated with any desired accuracy
on an arbitrary interval $(u, U)$ with $u > 0$ by the (SE) model with parameters $(c, d)$ satisfying
equation (9) where the arrow is replaced by an equality. The limit $c \to 0$ provides any
desired approximation to the Pareto distribution, uniformly on any finite interval $(u, U)$.
This deep relationship between the SE and power law models allows us to understand why
it can be very difficult to decide, on a statistical basis, which of these models fits the data
best fTR [T8]. This insight can be made rigorous to develop a formal statistical test of the
(SE) hypothesis versus the Pareto hypothesis [HI [19].
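The convergence of the SE model towards the Pareto model can be illustrated numerically. The sketch below is a hedged illustration with assumed values ($\mu = 1.5$, $u = 1$, $U = 100$) and hypothetical function names. It imposes $c\,(u/d)^c = \mu$ exactly and works with the quantity $d^c$ rather than $d$ itself, because $d$ becomes too small to represent in floating point as $c \to 0$.

```python
import math

def se_pdf(x, c, d_pow_c, u):
    """SE pdf above threshold u, written with d**c passed as d_pow_c:
    (c / d**c) * x**(c - 1) * exp(-(x**c - u**c) / d**c)."""
    return (c / d_pow_c) * x ** (c - 1.0) * math.exp(-(x ** c - u ** c) / d_pow_c)

def pareto_pdf(x, mu, u):
    """Pareto pdf mu * u**mu / x**(mu + 1) for x >= u."""
    return mu * u ** mu / x ** (mu + 1.0)

def d_pow_c_for(c, mu, u):
    """Value of d**c such that c * (u / d)**c equals mu exactly."""
    return c * u ** c / mu

def max_relative_deviation(c, mu=1.5, u=1.0, U=100.0, n=200):
    """Largest relative gap between SE and Pareto pdfs on a log-spaced grid of [u, U]."""
    d_pow_c = d_pow_c_for(c, mu, u)
    worst = 0.0
    for i in range(n + 1):
        x = u * (U / u) ** (i / n)
        p = pareto_pdf(x, mu, u)
        worst = max(worst, abs(se_pdf(x, c, d_pow_c, u) - p) / p)
    return worst

exponents = (0.5, 0.1, 0.01, 0.001)
deviations = [max_relative_deviation(c) for c in exponents]
for c, dev in zip(exponents, deviations):
    print(c, dev)
```

The printed deviation shrinks as $c$ decreases, which is the practical reason why the two families are hard to tell apart on a finite range of data.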

From a theoretical viewpoint, this class of distributions (8) is motivated in part by the
fact that the large deviations of multiplicative processes are generically distributed with
stretched exponential distributions [2D]. Stretched exponential distributions are also
parsimonious examples of the important subset of sub-exponentials, that is, of the general class
of distributions decaying slower than an exponential [21]. This class of sub-exponentials
shares several important properties of heavy-tailed distributions [22], not shared by
exponentials or distributions decreasing faster than exponentials: for instance, they have "fat
tails" in the sense of the asymptotic probability weight of the maximum compared with
the sum of large samples [1] (see also [3], Chapters 1 and 6).

Notwithstanding their fat-tailedness, stretched exponential distributions have all their
moments finite, in contrast with regularly varying distributions for which moments of order
equal to or larger than the tail index $\mu$ are not defined. However, they do not admit an
exponential moment, which leads to problems in the reconstruction of the distribution
from the knowledge of their moments [23]. In addition, the existence of all moments is an
important property allowing for an efficient estimation of any high-order moment, since
it ensures that the estimators are asymptotically Gaussian. In particular, for stretched
exponentially distributed random variables, the variance, skewness and kurtosis can be
accurately estimated, in contrast to random variables with regularly varying distributions
with tail index smaller than about 5.

3 The fascination with power laws 

Probability distribution functions with a power law dependence in terms of event or
object sizes seem to be ubiquitous statistical features of natural and social systems. It has
repeatedly been argued that such an observation relies on an underlying self-organizing
mechanism, and therefore power laws should be considered as the statistical imprints of
complex systems. It is often claimed that the observation of a power law relation in data
points to specific kinds of mechanisms at its origin, which can suggest a deep
connection with other, seemingly unrelated systems. In complex systems, the appearance
of power law distributions is often thought to be the signature of hierarchy and robustness.
In the last two decades, such claims have been made for instance for earthquakes,
weather and climate changes, solar flares, the fossil record, and many other systems, to
promote the relevance of self-organized criticality as an underlying mechanism for the
organization of complex systems. This claim is often unwarranted as there are many
non-self-organizing mechanisms producing power law distributions [25] [26] 131 HZ].

Research on the origins of power law relations, and efforts to observe and validate them
in the real world, are extremely active in many fields of modern science, including physics,
geophysics, biology, medical sciences, computer science, linguistics, sociology, economics
and more. One can attempt to summarize briefly the present understanding as follows.

3.1 Statistical physics in general and the theory of critical phenomena

The study of critical phenomena in statistical physics suggests that power laws emerge 
close to special critical or bifurcation points separating two different phases or regimes of 
the system. In systems at thermodynamic equilibrium modeled by general spin models, 
the renormalization group theory [28] has demonstrated the existence of universality, so
that diverse systems exhibit the same critical exponents and identical scaling behavior as
they approach criticality, i.e., they share the same fundamental macroscopic properties. For
instance, the behavior of water and CO2 at their boiling points at a certain critical pressure
and that of a magnet at its Curie point fall in the same universality class because they 
can be characterized by the same order parameter in the same space dimension. In fact, 
almost all material phase transitions are described by a small set of universality classes. 

From this perspective, the fascination with power laws reflects the fact that they charac- 
terize the many coexisting and delicately interacting scales at a critical point. The existence 
of many scales leading to complex geometrical properties is often associated with fractals 
[T2]. While it is true that critical points and fractals share power law relations, power law
relations and power law distributions are not the same. The latter, which is the subject of
this essay, describes the probability density function or frequency of occurrence of objects
or events, such as the frequency of earthquakes of a given magnitude range. In contrast,
power law relations between two variables (such as the magnetization and temperature in
the case of the Curie point of a magnet) describe a functional abstraction belonging to or
characteristic of these two variables. Both power law relations and power law distributions
can result from the existence of a critical point. A simple example in percolation is (i) the
power law dependence of the size of the largest cluster as a function of the distance from the
percolation threshold and (ii) the power law distribution of cluster sizes at the percolation
threshold [29].

3.2 Out-of-equilibrium phase transition and self-organized critical systems (SOC)

In the broadest sense, SOC refers to the spontaneous organization of a system driven 
from the outside into a globally stationary state, which is characterized by self-similar 
distributions of event sizes and fractal geometrical properties. This stationary state is 
dynamical in nature and is characterized by statistical fluctuations, which are generically
referred to as "avalanches."

The term "self-organized criticality" contains two parts. The word "criticality" refers to 
the state of a system at a critical point at which the correlation length and the susceptibility 
become infinite in the infinite size limit as in the preceding section. The label "self- 
organized" is often applied indiscriminately to pattern formation among many interacting 
elements. The concept is that the structuration, the patterns and large scale organization 
appear spontaneously. The notion of self-organization refers to the absence of control 
parameters. 

In this class of mechanisms, where the critical point is the attractor, the situation 
becomes more complicated as the number of universality classes proliferates. In particular, 
it is not generally well-understood why sometimes local details of the dynamics may change
the macroscopic properties completely while in other cases, the universality class is robust.
In fact, the more we learn about complex out-of-equilibrium systems, the more we realize
that the concept of universality developed for critical phenomena at equilibrium has to
be enlarged to embody a more qualitative meaning: the critical exponents defining the
universality classes are often very sensitive to many (but not all) details of the models [30].

Of course, one of the hailed hallmarks of SOC is the existence of power law distributions
of "avalanches" and of other quantities [21] EB E].

3.3 Non-exhaustive list of mechanisms leading to power law distributions

There are many physical and/or mathematical mechanisms that generate power law
distributions and self-similar behavior. Understanding how a mechanism is selected by the
microscopic laws constitutes an active field of research. We can propose the following
non-exhaustive list of mechanisms that have been found to be operating in different complex
systems, and which can lead to power law distributions of avalanches or cluster sizes. For
most of these mechanisms, we refer the reader to [3] (Chapters 14 and 15) and to [32] [27]
for detailed explanations and the relevant bibliography. However, some of the mechanisms
mentioned here have not been reviewed in these three references and are thus new to the
list developed in particular in [3]. We should also stress that some of the mechanisms in
this list are actually different incarnations of the same underlying idea (for instance
preferential attachment, which is a re-discovery of the Yule process; see [33] for an informative
historical account).

1. percolation, fragmentation and other related processes,

2. directed percolation and its universality class of so-called "contact processes",

3. cracking noise and avalanches resulting from the competition between frozen disorder
and local interactions, as exemplified in the random field Ising model, where
avalanches result from hysteretic loops [M],

4. random walks and their properties associated with their first passage statistics [35]
in homogeneous as well as in random landscapes,

5. flashing annihilation in Verhulst kinetics [36],

6. sweeping of a control parameter towards an instability [25] [37],

7. proportional growth by multiplicative noise with constraints (the Kesten process [3S]
and its generalization for instance in terms of generalized Lotka-Volterra processes
[39]), whose ancestry can be traced to Simon and Yule,

8. competition between multiplicative noise and birth-death processes [40],

9. growth by preferential attachment [32],

10. exponential deterministic growth with random times of observations (which gives the
Zipf law) [41],

11. constrained optimization with power law constraints (HOT for highly optimized
tolerant),

12. control algorithms, which employ optimal parameter estimation based on past
observations, have been shown to generate broad power law distributions of fluctuations
and of their corresponding corrections in the control process [42] H3],

13. on-off intermittency as a mechanism for power law pdf of laminar phases [^ B5],

14. self-organized criticality, which comes in many flavors as explained in Chapter 15 of [3]:

• cellular automata sandpiles with and without conservation laws,

• systems made of coupled elements with threshold dynamics,

• critical de-synchronization of coupled oscillators of relaxation,

• nonlinear feedback of the order parameter onto the control parameter,

• generic scale invariance,

• mapping onto a critical point,

• extremal dynamics.

If there is one lesson to extract from this impressive list, it is that, when observing an 
approximate linear trend in the log-log plot of some data distribution, one should refrain 
from jumping to hasty conclusions on the implications of this approximate power law 
behavior. Another lesson is that power laws appear to be so ubiquitous perhaps because 
many roads lead to them! 
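As one concrete road to a power law, mechanism 10 of the list can be simulated in a few lines. The sketch below rests on assumptions chosen for illustration only: a size grows deterministically as $\exp(a t)$ and is observed at a random time $t$ drawn from an exponential distribution with rate $b$. In that case the probability that the observed size exceeds $x$ is $x^{-b/a}$, a power law tail with exponent $\mu = b/a$, and the Zipf law corresponds to $a = b$. Function names and parameter values are hypothetical.

```python
import math
import random

def sample_sizes(n, growth_rate, obs_rate, seed=0):
    """Sizes exp(growth_rate * t) observed at exponentially distributed random times t."""
    rng = random.Random(seed)
    return [math.exp(growth_rate * rng.expovariate(obs_rate)) for _ in range(n)]

def empirical_survival(sizes, x):
    """Fraction of the sample strictly larger than x."""
    return sum(1 for s in sizes if s > x) / len(sizes)

a, b = 1.0, 1.0                    # equal rates give the Zipf exponent mu = b / a = 1
sizes = sample_sizes(200_000, a, b)
for x in (2.0, 10.0, 100.0):
    # Empirical survival next to the predicted power law x**(-b / a).
    print(x, empirical_survival(sizes, x), x ** (-b / a))
```

No self-organization or criticality enters this construction, which illustrates why a straight line in a log-log plot is not by itself evidence for any particular mechanism.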

4 Testing for power law distributions in your data 

Although power law distributions are attractive for their simplicity (they are straight lines
on log-log plots) and may be justified from theoretical reasons as discussed above,
demonstrating that data do indeed follow a power law distribution requires more than simply
fitting. Indeed, several alternative functional forms can appear to follow a power law
form over some extent, such as stretched exponentials and log-normal distributions. Thus,
validating that a given distribution is a power law is not easy and there is no silver bullet.

Clauset et al. [16] have recently summarized some statistical techniques for making 
accurate parameter estimates for power-law distributions, based on maximum likelihood 
methods and the Kolmogorov-Smirnov statistic. They illustrate these statistical methods 
on twenty-four real-world data sets from a range of different disciplines. In some cases, 
they find that power laws are consistent with the data while in others the power law is
ruled out. 
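The two ingredients mentioned above, maximum likelihood estimation and the Kolmogorov-Smirnov statistic, can be sketched for a continuous power law tail as follows. This is a minimal illustration and not the procedure of the cited reference: the lower threshold `x_min` is assumed to be known, the data are synthetic, and all function names are hypothetical. For a survival function $(x/x_{\min})^{-\mu}$, the maximum likelihood estimate of the exponent is $\hat\mu = n / \sum_i \ln(x_i/x_{\min})$, with asymptotic standard error $\hat\mu/\sqrt{n}$.

```python
import math
import random

def fit_tail_exponent(data, x_min):
    """Maximum likelihood estimate of mu for a power law tail above x_min.

    Returns (mu_hat, asymptotic standard error, number of tail points).
    """
    tail = [x for x in data if x >= x_min]
    n = len(tail)
    mu_hat = n / sum(math.log(x / x_min) for x in tail)
    return mu_hat, mu_hat / math.sqrt(n), n

def ks_distance(data, x_min, mu):
    """Kolmogorov-Smirnov distance between the tail data and the power law cdf."""
    tail = sorted(x for x in data if x >= x_min)
    n = len(tail)
    distance = 0.0
    for i, x in enumerate(tail):
        model_cdf = 1.0 - (x / x_min) ** (-mu)
        # The empirical cdf jumps from i/n to (i+1)/n at x: check both sides.
        distance = max(distance, abs(model_cdf - i / n), abs((i + 1) / n - model_cdf))
    return distance

# Synthetic Pareto sample by inverse transform (assumed true exponent 1.5).
rng = random.Random(1)
true_mu, x_min = 1.5, 1.0
data = [x_min * (1.0 - rng.random()) ** (-1.0 / true_mu) for _ in range(5000)]

mu_hat, mu_err, n_tail = fit_tail_exponent(data, x_min)
d_ks = ks_distance(data, x_min, mu_hat)
print(mu_hat, mu_err, n_tail, d_ks)
```

A small Kolmogorov-Smirnov distance indicates only that the power law is not contradicted by the data; turning it into a test requires comparing it with the distances obtained from synthetic samples, and comparing the power law with alternative families.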

Here, we offer some advice for the characterization of a power law distribution as
a possible adequate representation of a given data set. We emphasize good sense and 
practical aspects. 

1. Survivor distribution. First, the survival distribution should be constructed using
the raw data by sorting the values in increasing order. Then, rank (counted from the
largest value) versus value gives immediately a non-normalized survival distribution.
The advantage of this construction is that it does not require binning or kernel
estimation, which is a delicate art, as we have alluded to.

2. Probability density function. The previous construction of the complementary
cumulative (or survivor) distribution function should be complemented with that of
the density function. Indeed, it is well-known that the cumulative distribution, being
a "cumulative" integral of the density function as its name indicates, may be
contaminated by disturbances at one end of the density function, leading to rather long
cross-overs that may hide or perturb the power law. For instance, if the generating
density distribution is a power law truncated by an exponential, as found for critical
systems not exactly at their critical point or in the presence of finite-size effects [47],
the power law part of the cumulative distribution will be strongly distorted, leading
to a spurious estimation of the exponent $\mu$. This problem can be in large part
alleviated by constructing the pdf using binning or, even better, kernel methods (see
the very readable article [48] and references therein). By testing and comparing the
survival and the probability density distributions, one obtains either a confirmation
of the power law scaling or an understanding of the origin(s) of the deviations from
the power law.

3. Structural analysis by visual inspection. Given that these first two steps have 
been performed, we recommend a preliminary visual exploration by plotting the 
survival and density distributions in (i) linear-linear coordinates, (ii) log-linear coor- 
dinates (linear abscissa and logarithmic ordinate) and (iii) log-log coordinates (log- 
arithmic abscissa and logarithmic ordinate). The visual comparison between these 
three plots provides a fast and intuitive view of the nature of the data. 

• A power law distribution will appear as a convex curve in the linear-linear and 
log-linear plots and as a straight line in the log-log plot. 

• A Gaussian distribution will appear as a bell-shaped curve in the linear-linear
plot, as an inverted parabola in the log-linear plot and as a strongly concave,
sharply falling curve in the log-log plot.

• An exponential distribution will appear as a convex curve in the linear-linear
plot, as a straight line in the log-linear plot and as a concave curve in the log-log
plot.

Having in mind the shape of these three reference distributions in these three
representations provides fast and useful reference points to classify the unknown
distribution under study. For instance, if the log-linear plot shows a convex shape (upward
curvature), we can conclude that the distribution has a tail fatter than an exponential.
Then, the log-log plot will confirm if a power law is a reasonable description.
If the log-log plot shows a downward curvature (concave shape), together with the
first information that the log-linear plot shows a convex shape, we can conclude that
the distribution has a tail fatter than an exponential but thinner than a power law.
For example, it could be a gamma distribution ($\sim x^{n} \exp[-x/x_0]$ with $n > 0$) or a
stretched exponential distribution (expression (8) with $c < 1$). Only more detailed quantitative
analysis will allow one to refine the diagnostic, often without definite conclusions
(see as an illustration the detailed statistical analysis comparing the power law to the
stretched exponential distributions to describe the distribution of financial returns
[19]).

The deviations from linearity in the log-log plot suggest the boundaries within which 
the power law regime holds. We said "suggest" as a visual inspection is only a first 
step, which can be actually misleading. While we recommend a first visual inspection,
it is only a first indication, not a proof. It is a necessary step to convince oneself
(and the reviewers and journal editors) but certainly not a sufficient condition. It is a 
standard rule of thumb that a power law scaling is thought to be meaningful if it holds 
over at least two to three decades on both axes and is bracketed by deviations on 
both sides whose origins can be understood (for instance, due to insufficient sampling 
and/or finite-size effects). 

As an illustration of the potential errors stemming from visual inspection, we refer
to the discussion of Sornette et al. [49] on the claim of Pacheco et al. [50] of the
existence of a break in the Gutenberg-Richter distribution of earthquake magnitudes
at m = 6.4 for California. This break was claimed to reveal the finiteness of the
crust thickness according to Pacheco et al. [1992]. This claim has subsequently been
shown to be unsubstantiated, as the Gutenberg-Richter law (which is a power law
when expressed in earthquake energies or seismic moments) seems to remain valid
up to magnitudes of 7.5 in California and up to magnitudes of about 8 to 8.5 worldwide.
This visual break at m = 6.4 turned out to be just a statistical deviation, completely
expected from the nature of power law fluctuations [51] [15].

4. OLS fitting. The next step is often to perform an OLS (ordinary least-squares)
regression of the data (survival distribution or kernel-reconstructed density) in the
logarithm of the variables, in order to estimate the parameters of the power law.
These parameters are the exponent $\mu$, the scale factor and possibly an upper
threshold or other parameters controlling the cross-over to other behaviors outside the
scaling regime. Using logarithms ensures that all the terms in the sum of squares over
the different data points contribute approximately similarly in the OLS. Otherwise,
without logarithms, given the large range of values spanned by a typical power law
distribution, a relative error of say 1% around a value of the order of $10^{3}$ would
have a weight in the sum ten thousand times larger than the weight due to the same
relative error of 1% around a value of the order of $10^{1}$, biasing the estimation of the
parameters towards fitting preferentially the large values. In addition, in logarithmic
units, the estimation of the exponent of a power law constitutes a linear problem
which is solved analytically.

5. Maximum likelihood estimation. Using an OLS method to estimate the parameters
of a power law assumes implicitly that the deviations from
the power law (actually the differences between the logarithm of the data and the
logarithm of the power law distribution) are normally distributed. This may not be
a suitable approximation. An estimation which removes this assumption consists in
using the likelihood method, in which the parameters of the power law are chosen
so as to maximize the likelihood function. When the data points are independent,
the likelihood function is nothing but the product $\prod_{i=1}^{n} P(x_i)$ over the data points
$x_1, x_2, \ldots, x_n$ of the power law distribution $P(x)$. In this case, the exponent $\mu$ which
maximizes this likelihood (or equivalently and more conveniently its logarithm, called
the log-likelihood) is called the Hill estimator [52]. It reads

$$\frac{1}{\mu} = \frac{1}{n} \sum_{j=1}^{n} \ln\left(\frac{x_j}{x_{\min}}\right) \qquad (11)$$

where $x_{\min}$ is the smallest value among the $n$ values used in the data set for the
estimation of $\mu$. Since power laws are often asymptotic characteristics of the tail,
it is appropriate not to use the full data set but only the upper tail with data values
above a lower threshold. Then, plotting $1/\mu$ or $\mu$ as a function of the lower threshold
usually provides a good sense of the existence of a power law regime: one should
expect an approximate stability of $1/\mu$ over some scaling regime. Note that the Hill
estimator provides an unbiased estimate of $1/\mu$ while $\mu$ obtained by inverting $1/\mu$ is
slightly biased (see e.g. Chapter 6 in [3]). We refer to [53] [51] for improved versions
and procedures of the Hill estimator which deal with finite ranges and dependence.

6. Non-parametric methods. Methods testing for a power law behavior in a given
empirical distribution which are non-parametric and sensitive provide useful
complements to the above fitting and parametric estimation approaches. Pisarenko et al.
[55] and Pisarenko and Sornette [56] have developed a new statistic, such that a
power law behavior is associated with a zero value of the statistic independently of
the numerical value of the exponent $\mu$ and with a non-zero value otherwise.
Plotting this statistic as a function of the lower threshold of the data sample allows
one to detect subtle deviations from a pure power law. Lasocki [57] and Lasocki
and Papadimitriou [58] have developed another non-parametric approach to detect
deviations from a power law, the smoothed bootstrap test for multimodality, which
makes it possible to test the complexity of the distribution without specifying any
particular probabilistic model. The method relies on testing the hypotheses that the
number of modes or the number of bumps exhibited by the distribution function
is equal to 1. Rejection of one of these hypotheses indicates that the distribution has
more complexity than described by a simple power law.
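
Steps 1, 4 and 5 above can be sketched in a few lines of Python. This is a minimal illustration on synthetic data, not a reference implementation: the function names, the Pareto sample and the list of thresholds are assumptions introduced here, and the lower threshold is used in place of the smallest retained value in expression (11).

```python
import math
import random


def survival_by_rank(data):
    """Rank-ordering: pair each value with the fraction of the sample that is >= it."""
    ordered = sorted(data, reverse=True)          # largest value gets rank 1
    n = len(ordered)
    return [(x, rank / n) for rank, x in enumerate(ordered, start=1)]


def hill_inverse_exponent(data, x_min):
    """Mean of ln(x_j / x_min) over the values at or above x_min, an estimate of 1/mu."""
    tail = [x for x in data if x >= x_min]
    return sum(math.log(x / x_min) for x in tail) / len(tail)


def ols_slope(xs, ys):
    """Slope of the least-squares line through the points (xs, ys)."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    return sxy / sxx


rng = random.Random(42)
true_mu = 2.0
# Synthetic Pareto sample with P(X > x) = x**(-true_mu) for x >= 1 (illustrative data).
sample = [(1.0 - rng.random()) ** (-1.0 / true_mu) for _ in range(20000)]

# Step 1: survival distribution from ranks, no binning needed.
survival = survival_by_rank(sample)

# Step 4: OLS on the logarithms; the slope of ln S versus ln x estimates -mu.
log_x = [math.log(x) for x, s in survival]
log_s = [math.log(s) for x, s in survival]
mu_ols = -ols_slope(log_x, log_s)

# Step 5: Hill estimate of mu for several lower thresholds; a plateau suggests scaling.
thresholds = [1.0, 1.5, 2.0, 3.0, 5.0]
mu_hill = [1.0 / hill_inverse_exponent(sample, t) for t in thresholds]

print("OLS estimate of mu:", round(mu_ols, 3))
for t, m in zip(thresholds, mu_hill):
    print("threshold", t, "Hill estimate of mu", round(m, 3))
```

For a pure power law the Hill estimates should stay roughly constant as the threshold is raised, with growing scatter as fewer points remain in the tail. A systematic drift with the threshold would point to a deviation from the power law over that range.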

Once the evidence for a power law distribution has been reasonably demonstrated, the 
most difficult task remains: finding a mechanism and model which can explain the data. 
Note that the term "explain" carries different meanings depending on which expert you
are speaking to. For a statistician, having been unable to reject the power law function
given the data amounts to saying that the power law model "explains" the data. The emphasis
of the statistician will be on refining parametric and non-parametric procedures to test the
way the power law "fits" or deviates from the empirical data. In contrast, a physicist or
a natural scientist sees this only as a first step, and attributes the word "explain" to the
stage where a mechanism in terms of a more fundamental process or first principles can
derive the power law. But even among natural scientists, there is no consensus on what is a
suitable "explanation." The reason stems from the different cultures and levels of study in
different fields, well addressed in the famous paper "More is different" of Anderson [59]: a
suitable explanation for a physicist will frustrate a chemist, whose own explanation will in
turn leave a biologist unhappy. Each scientific discipline is anchored in a more fundamental scientific level
while having developed its specific concepts, which provide the underpinning for the next
scientific level of description (think for instance of the hierarchy: physics → chemistry →
molecular biology → cell biology → animal biology → ethology → sociology → economics).



Once a model at a given scientific description level has been proposed, the action of the
model on inputs gives outputs which are compared with the data. Verifying that the model,
inspired by the preliminary power law evidence, adequately fits this power law is a first
step. Unfortunately, much too often, scientists stop there and are happy to report that they
have a model that fits their empirical power law data. This is not good science. Keeping
in mind the many possible mechanisms at the origin of power law distributions reviewed
above, a correct procedure is to run the candidate model to get other predictions that can
themselves be put to test. This validation is essential to determine the degree to which the
model is an accurate representation of the real world from the perspective of its intended
uses. Reviewing a large body of literature devoted to the problem of validation, Sornette
et al. [60] have proposed a synthesis in which the validation of a given model is formulated
as an iterative construction process that mimics the often implicit process occurring in the
minds of scientists. Validation is nothing but the progressive build-up of trust in the model,
based on putting the model to test against non-redundant novel experiments or data, that
allows one to make a decision and act decisively. The applications of the validation program
to a cellular automaton model for earthquakes, to a multifractal random walk model for
financial time series, to an anomalous diffusion model for solar radiation transport in the
cloudy atmosphere, and to a computational fluid dynamics code for the Richtmyer-Meshkov
instability, exemplify the importance of going beyond the simple qualification of a power
law.

5 Beyond power laws: "Kings" 

5.1 The standard view 

Power law distributions incarnate the notion that extreme events are not exceptional
9-sigma events (to refer to the terminology using the Gaussian bell curve and its standard
deviation $\sigma$ as the metric to quantify deviations from the mean). Instead, extreme events
should be considered as rather frequent and part of the same organization as the other 
events. In this view, a great earthquake is just an earthquake that started small ... and 
did not stop; it is inherently unpredictable due to its sharing of all the properties and 
characteristics of smaller events (except for its size), so that no genuinely informative 
precursor can be identified [61]. This is the view expounded by Bak and co-workers in
their formulation of the concept of self-organized criticality [62] [20]. In the following, we
outline several promising directions of research that expand on these ideas. 

5.2 Self-organized criticality versus criticality 

However, there are many suggestions that this does not need to be the case. One argument
is that criticality and self-organized criticality (SOC) can actually co-exist. The hallmark
of criticality is the existence of specific precursory patterns (increasing susceptibility and
correlation length) in space and time. Continuing with the example of earthquakes, the
idea that a great earthquake could result from a critical phenomenon has been put forward
by different groups, starting almost three decades ago [63] [64] [65]. Attempts to link
earthquakes and critical phenomena find support in the evidence that rupture in heterogeneous
media is similar to a critical phenomenon (see Chapter 13 of [3] and references therein).
Also indicative is the often reported observation of increased intermediate magnitude
seismicity before large events [66] [67]. An illustration of the coexistence of criticality and of
SOC is found in a simple sandpile model of earthquakes on a hierarchical fault structure
[68]. Here, the important ingredient is to take into account both the nonlinear dynamics
and the complex geometry. From the point of view of self-organized criticality, this is
surprising news: large earthquakes do not lose their identity. In the model of Huang et al.
[68], a large earthquake is different from a small one, a very different story than the one
told by common SOC wisdom in which any precursory state of a large event is essentially
identical to a precursory state of a small event and an earthquake does not know how
large it will become. The difference comes from the absence of geometry in standard SOC
models. Reintroducing geometry is essential. In models with hierarchical fault structures,
one finds a degree of predictability of large events.

5.3 Beyond power laws: five examples of "kings" 

Are power laws the whole story? The following examples suggest that some extreme events 
are even "wilder" than predicted by the extrapolation of the power law distributions. 
They can be termed "outliers" or even better "kings" [17]. According to the definition
of the Engineering Statistical Handbook [69], "An outlier is an observation that lies an
abnormal distance from other values in a random sample from a population." Here, we
follow Laherrère and Sornette [17] and use the term "king" to refer to events which are
even beyond the extrapolation of the fat tail distribution of the rest of the population.

• Material failure and rupture processes. There is now ample evidence that the
distribution of damage events, for instance quantified by the acoustic emission
radiated by micro-cracking in heterogeneous systems, is well-described by a
Gutenberg-Richter like power law [70] [71] [72] [73]. But consider now the energy released in the
final global event rupturing the system in pieces! This release of energy is many
times larger than the largest ever recorded event in the power law distribution.
Material rupture exemplifies the co-existence of a power law distribution and a
catastrophic event lying beyond the power law.

• Gutenberg-Richter law and characteristic earthquakes. In seismo-tectonics, 
the situation is muddy because of the difficulties with defining unambiguously the 
spatial domain of influence of a given fault. The researchers who have delineated a 
spatial domain surrounding a clearly mapped large fault claim to find a Gutenberg- 
Richter distribution up to a large magnitude region characterized by a bump or 
anomalous rate of large earthquakes. These large earthquakes have rupture lengths 
comparable with the fault length [74] [75]. If proven valid, this concept of a
characteristic earthquake provides another example in which a "king" coexists with a power
law distribution of smaller events. Others have countered that this bump disappears
when removing the somewhat artificial partition of the data [76] [77], so that the
characteristic earthquake concept may be a statistical artifact. In this view, a particular
fault may appear to have characteristic earthquakes, but the stress-shedding region, 
as a whole, behaves according to a pure scale-free power law distribution. 

Several theoretical models have been offered to support the idea that, in some seismic
regimes, there is a coexistence between a power law and a large size regime (the "king"
effect). Gil and Sornette [78] reported that this occurs when the characteristic rate
for local stress relaxations is fast compared with the diffusion of stress within the
system. The interplay between dynamical effects and heterogeneity has also been
shown to change the Gutenberg-Richter behavior to a distribution of small events
combined with characteristic system size events [79] [80] [81] [82]. On the empirical
side, progress should be made in testing the characteristic earthquake hypothesis
by using the prediction of the models to identify independently of seismicity those
seismic regions in which the king effect is expected. This remains to be done
[Ben-Zion, private communication, 2007].

• Extreme king events in the pdf of turbulent velocity fluctuations. The
evidence for kings does not require and is not even synonymous in general with the
existence of a break or of a bump in the distribution of event sizes. This point is
well-illustrated in the shell models of turbulence, which are believed to capture the
essential ingredients of these flows, while being amenable to analysis. Such "shell"
models replace the three-dimensional spatial domain by a series of uniform onion-like
spherical layers with radii increasing as a geometrical series 1, 2, 4, 8, ..., $2^n$ and
communicating with each other mostly with nearest neighbors. The quantity of
interest is the distribution of velocity variations between two instants at the same
position or between two points simultaneously. L'vov et al. [83] have shown that
they could collapse the pdfs of velocity fluctuations for different scales only for the
small velocity fluctuations, while no scaling held for large velocity fluctuations. The
conclusion is that the distributions of velocity increments seem to be composed of
two regions, a region of so-called "normal scaling" and a domain of extreme events.
They could also show that these extreme fluctuations of the fluid velocity correspond
to intensive peaks propagating coherently (like solitons) over several shell layers with
a characteristic bell-like shape, approximately independent of their amplitude and
duration (up to a rescaling of their size and duration). One could summarize these
findings by saying that "characteristic" velocity pulses decorate an otherwise scaling
probability distribution function.

• Outliers and kings in the distribution of financial drawdowns. In a series of
papers, Johansen and Sornette [84] [85] [86] have shown that the distribution of
drawdowns in financial markets exhibits this coexistence of a fat tail with a characteristic
regime with "kings" (called "outliers" in the papers). The analysis encompasses
exchange markets (US dollar against the Deutsche Mark and against the Yen), the major
world stock markets, the U.S. and Japanese bond markets and commodity markets.
Here, drawdowns are defined as a continuous decrease in the value of the price at
the close of each successive trading day. The results remain robust when using
"coarse-grained drawdowns," which allow for a certain degree of fuzziness in the
definition of cumulative losses. Interestingly, the pdf of returns at a fixed time scale,
usually the daily returns, does not exhibit any anomalous king behavior in the tail:
the pdf of financial returns at fixed time scales seems to be adequately described by
power law tails [87]. The interpretation proposed by Johansen and Sornette is that
these drawdown kings are associated with crashes, which occur due to a global
instability of the market which amplifies the normal behavior via strong positive feedback
mechanisms [88].

• Paris as the king in the Zipf distribution of French city sizes. Since Zipf [89],
it is well-documented that the distribution of city sizes (measured by the number of
inhabitants) is, in many countries, a power law with an exponent $\mu$ close to 1. France
is not an exception as it exhibits a nice power law distribution of city sizes... except
for Paris which is completely out of range, a genuine king with a size several times
larger than expected from the distribution of the rest of the population of cities [17].
This king effect reveals a particular historical organization of France, whose roots are
difficult to unravel. Nevertheless, we think that this king effect incarnated by Paris is
a significant signal to explain in order to understand the competition between cities
in Europe.
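
The definition of a drawdown used in the financial example above (an uninterrupted decrease of the closing price over successive trading days) can be made concrete with a short Python sketch. The function name `drawdowns` and the price series are hypothetical, and the loss is expressed here relative to the price at the start of the run, which is one possible convention among several.

```python
def drawdowns(closes):
    """Relative loss of each uninterrupted run of daily declines in `closes`."""
    losses = []
    peak = closes[0]          # price at the start of the current run of declines
    falling = False
    for prev, curr in zip(closes, closes[1:]):
        if curr < prev:
            if not falling:   # a new run of declines starts at `prev`
                peak, falling = prev, True
        else:
            if falling:       # the run ended at `prev`: record its total loss
                losses.append((prev - peak) / peak)
                falling = False
    if falling:               # the series ends in the middle of a decline
        losses.append((closes[-1] - peak) / peak)
    return losses


# Hypothetical closing prices: a two-day decline, then a three-day decline.
prices = [100.0, 102.0, 101.0, 99.0, 100.0, 104.0, 103.0, 90.0, 80.0, 85.0]
for loss in drawdowns(prices):
    print(f"drawdown of {100 * loss:.1f}%")
```

Applied to a long price history, the resulting list of losses is the sample whose survival distribution would be examined for a fat tail and for possible "kings" beyond it.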

5.4 Kings and crises in complex systems 

We propose that these kings may reveal information which is complementary and
perhaps sometimes even more important than the power law pdf.

Indeed, it is essential to realize that the long-term behavior of these complex systems is
often controlled in large part by these rare catastrophic events: the universe was probably
born during an extreme explosion (the "big-bang"); the nucleosynthesis of all important
atomic elements constituting our matter results from the colossal explosion of supernovae;
the largest earthquake in California repeating about once every two centuries accounts for
a significant fraction of the total tectonic deformation; landscapes are more shaped by the
"millennium" flood that moves large boulders than by the action of all other eroding agents;
the largest volcanic eruptions lead to major topographic changes as well as severe climatic
disruptions; evolution is characterized by phases of quasi-stasis interrupted by episodic
bursts of activity and destruction; financial crashes can destroy in an instant trillions of
dollars; political crises and revolutions shape the long-term geopolitical landscape; even
our personal life is shaped in the long run by a few key "decisions/happenstances".

The outstanding scientific question is thus how such large-scale patterns of catastrophic 
nature might evolve from a series of interactions on the smallest and increasingly larger 
scales. In complex systems, it has been found that the organization of spatial and temporal 
correlations does not stem, in general, from a nucleation phase diffusing across the system.
It results rather from a progressive and more global cooperative process occurring over the 
whole system by repetitive interactions. An instance would be the many occurrences of 
simultaneous scientific and technical discoveries signaling the global nature of the maturing 
process. 

Standard models and simulations of scenarios of extreme events are subject to numerous
sources of error, each of which may have a negative impact on the validity of the predictions
[90]. Some of the uncertainties are under control in the modelling process; they usually
involve trade-offs between a more faithful description and manageable calculations. Other
sources of errors are beyond control as they are inherent in the modelling methodology of the
specific disciplines. The two known strategies for modelling are both limited in this respect:
analytical theoretical predictions are out of reach for most complex problems, while brute
force numerical resolution of the equations (when they are known) or of scenarios is reliable
only in the "center of the distribution", i.e. in the regime far from the extremes where good
statistics can be accumulated. Crises are extreme events that occur rarely, albeit with
extraordinary impact, and are thus completely under-sampled and poorly constrained.
Even the introduction of teraflop (or even petaflop in the near future) supercomputers
does not change qualitatively this fundamental limitation.

Recent developments suggest that non-traditional approaches, based on the concepts 
and methods of statistical and nonlinear physics could provide a middle way to direct the 
numerical resolution of more realistic models and the identification of relevant signatures 
of impending catastrophes. Enriching the concept of self-organized criticality, the
predictability of crises would then rely on the fact that they are fundamentally outliers, e.g.
large earthquakes are not scaled-up versions of small earthquakes but the result of specific 
collective amplifying mechanisms. To address this challenge, the available theoretical tools 
comprise in particular bifurcation and catastrophe theories, dynamical critical phenomena 
and the renormalization group, nonlinear dynamical systems, and the theory of partially 
(spontaneously or not) broken symmetries. Some encouraging results have been gathered 
on concrete problems, such as the prediction of the failure of complex engineering struc- 
tures, the detection of precursors of stock market crashes and of human parturition, with 
exciting potential for earthquakes. At the beginning of the third millennium, it is tempting to
extrapolate and forecast that a larger multidisciplinary integration of the physical sciences
together with artificial intelligence and soft-computational techniques, fed by analogies and
fertilization across the natural sciences, will provide a better understanding of the limits
of predictability of catastrophes and adequate measures of risks for a more harmonious and
sustainable future of our complex world.

6 Future Directions 

Our exposition has mainly focused on the concept of distributions of event sizes, as a first 
approach to characterize the organization of complex systems. But, probability distribution 
functions are just one-point statistics and thus provide only an incomplete picture of the 
organization of complex systems. This opens the road to several better measures of the 
organization of complex systems. 

• Statistical estimation of probability distribution functions is a delicate art. An active
research field in mathematical statistics which is insufficiently used by practitioners
of other sciences is the domain of "robust estimation." Robust estimation techniques
are methods which are insensitive to small departures from the idealized assumptions
which have been used to optimize the algorithm. Such techniques include M-estimates
(which follow from maximum likelihood considerations), L-estimates (which are linear
combinations of order statistics), and R-estimates (based on statistical rank tests).

• Ideally, one would like to measure the full multivariate distribution of events, which 
can be in full generality decomposed into the set of marginal distributions discussed 
above and of the copula of the system. A copula embodies completely the entire
dependence structure of the system [91] [95]. Copulas have recently become fashionable
in financial mathematics and in financial engineering [19]. Their use in other fields 
in the natural sciences is embryonic but can be expected to blossom. 

• When analyzing a complex system, a common trap is to assume without critical
thinking and testing that the statistics are stationary, so that monovariate (marginal)
and multivariate distribution functions are sufficient to fully characterize the system.
It is indeed a common experience that the dependence estimated and predicted by
standard models changes dramatically at certain times. In other words, the statistical
properties are conditional on specific regimes. The existence of regime-dependent
statistical properties has been discussed in particular in climate science, in medical
sciences and in financial economics. In the latter, a quite common observation is that
investment strategies, which have some moderate beta (coefficient of regression to
the market) for normal times, can see their beta jump to a much larger value (close
to 1 or larger depending on the leverage of the investment) at certain times when
the market collectively dives. Said differently, investments which are thought to be
hedged against negative global market trends may actually lose as much or more than
the global market, at certain times when a large majority of stocks plunge
simultaneously. In other words, the dependence structure and the resulting distributions at
different time scales may change in certain regimes.

The general problem of the application of mathematical statistics to non-stationary
data (including non-stationary time series) is very important, but alas, not much can 
be done. There are only a few approaches which may be used and only in specific 
conditions, which we briefly mention. 

1. Use of algorithms and methods which are robust with respect to possible non- 
stationarity in data, such as normalization procedures or the use of quantile 
samples instead of initial samples. 

2. Model non-stationarity by some low-frequency random processes, such as, e.g.,
a narrow-band random process $X(t) = A(t) \cos(\omega t + \phi(t))$ where $\omega \ll 1$ and
$A(t)$ and $\phi(t)$ are the slowly varying amplitude and phase. In this case, the
Hilbert transform can be very useful to characterize $\phi(t)$ non-parametrically.

3. The estimation of the parameters of a low-frequency process based on a "short" 
realization is often hopeless. In this case, the only quantity which can be eval- 
uated is the uncertainty (or scatter) of the results due to the non-stationarity. 

4. Regime switching, popularized by Hamilton [96] for autoregressive time series
models, is a special case of non-stationarity, which can be handled with specific
methods.

• We already discussed the problem of "kings." One key issue that needs more scrutiny 
is that these outliers are often identified only with metrics adapted to take into 
account transient increases of the time dependence in the time series of returns of
individual financial assets [85] (see also Chap. 3 of [88]). These outliers seem to belong
to a statistical population which is different from the bulk of the distribution and 
require some additional amplification mechanisms active only at special times. The 
presence of such outliers both in marginal distributions and in concomitant events, 
together with the strong impact of crises and of crashes in complex systems, suggests 
the need for novel measures of dependence between different definitions of events 
and other time-varying metrics across different variables. This program is part of 
the more general need for a joint multi-time-scale and multi-variate approach to the 
statistics of complex systems.

• The presence of outliers poses the problem of exogeneity versus endogeneity. An event
identified as anomalous could perhaps be cataloged as resulting from exogenous
influences. The concept of exogeneity is fundamental in statistical estimation [97] [98].
Here, we refer to the question of exogeneity versus endogeneity in the broader con- 
text of self-organized criticality, inspired in particular from the physical and natural 
sciences. As we already discussed, according to self-organized criticality, extreme 
events are seen to be endogenous, in contrast with previous prevailing views (see for 
instance the discussion in [62] [99]). But, how can one assert with 100% confidence
that a given extreme event is really due to an endogenous self-organization of the 
system, rather than to the response to an external shock? Most natural and social 
systems are indeed continuously subjected to external stimulations, noises, shocks, 
solicitations, forcing, which can widely vary in amplitude. It is thus not clear a priori 
if a given large event is due to a strong exogenous shock, to the internal dynamics of 
the system, or maybe to a combination of both. Addressing this question is funda- 
mental for understanding the relative importance of self-organization versus external 
forcing in complex systems and underpins much of the problem of dependence be- 
tween variables. The concepts of endogeneity and exogeneity have many applications 
in the natural and social sciences (see [100] for a review) and we expect this
viewpoint to develop into a general strategy of investigation.
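
To illustrate the remark on robust estimation in the list above, the following Python sketch compares the sample mean with two simple L-estimates, the median and a trimmed mean, on a small sample in which one value is replaced by a wild one. The data and the function name `trimmed_mean` are invented for the illustration; this is only one of many possible robust estimators.

```python
import statistics


def trimmed_mean(data, fraction):
    """L-estimate: average of the order statistics after dropping both tails."""
    ordered = sorted(data)
    k = int(len(ordered) * fraction)     # number of points dropped at each end
    kept = ordered[k:len(ordered) - k]
    return sum(kept) / len(kept)


# Hypothetical measurements scattered around 10.
clean = [9.8, 10.1, 9.9, 10.2, 10.0, 9.7, 10.3, 10.0, 9.9, 10.1]
contaminated = clean[:-1] + [1000.0]     # one wild value replaces the last point

for name, sample in (("clean", clean), ("contaminated", contaminated)):
    print(name,
          "mean", round(statistics.mean(sample), 3),
          "median", round(statistics.median(sample), 3),
          "trimmed", round(trimmed_mean(sample, 0.1), 3))
```

The single wild value drags the mean far from the bulk of the data, while the median and the trimmed mean barely move. The price of this insensitivity is that genuine extreme events, such as the "kings" discussed earlier, are deliberately down-weighted, so robust estimates describe the bulk of the distribution rather than its tail.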